Crowd counting in images has gained
considerable attention because of its strong
demand. A bare count, however, says little
about the characteristics of the crowd. We
envision a new problem, categorization in
crowd counting, which is extremely
challenging because of heavy occlusion,
perspective distortion, complex backgrounds,
and the lower body often not being visible.



□ Detecting and categorizing each person 
present in images as sitting or standing. 

□ Designing a system that would perform 
well in diverse environments. 

□ Finding the total counts of sitting and
standing people in a given image.


Stage 1: Feature Extraction by Pose Estimation

□ Detect and extract skeletal information of
the people present using a pose
estimator (Faster R-CNN + SPPE) [1].

□ Challenge: Noisy data in occluded and 
crowded environments. 



Stage 2: Classification Using a Feedforward Neural Network

□ Pass the body joint key points of each
detected person through a feedforward neural
network that labels the person as sitting or
standing.
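
The classification step can be illustrated with a minimal sketch. The poster does not give the layer sizes, the activation functions, or the number of joints, so everything below is an assumption made for illustration: 17 joints per person, one hidden layer of 16 units, ReLU, and a two-way softmax. The weights are random and untrained, so the predicted label is meaningless; the sketch only shows how a flat key-point vector flows through the network.

```python
import math
import random

NUM_JOINTS = 17  # assumption: 17 (x, y) joints per person; not stated in the source
HIDDEN_UNITS = 16  # assumption: hidden layer width chosen for illustration


def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, bias):
    # weights holds one row per output unit
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    # Small random weights; a real model would learn these from labeled poses.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(keypoints, layers):
    """keypoints: flat list [x0, y0, x1, y1, ...] for one detected person."""
    hidden = keypoints
    for weights, bias in layers[:-1]:
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = layers[-1]
    probs = softmax(dense(hidden, weights, bias))
    label = "sitting" if probs[0] >= probs[1] else "standing"
    return label, probs


rng = random.Random(0)
layers = [init_layer(2 * NUM_JOINTS, HIDDEN_UNITS, rng),
          init_layer(HIDDEN_UNITS, 2, rng)]
person = [rng.random() for _ in range(2 * NUM_JOINTS)]  # stand-in key points
label, probs = classify(person, layers)
print(label, probs)
```

In practice the key points would come from the pose estimator of Stage 1, and the per-person labels would then be tallied into the sitting and standing counts.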



□ We introduce a new dataset of 360 images 
with different person densities covering 
various environments. 


□ To measure classification accuracy, we use
traditional metrics (Precision, Recall,
Accuracy).
□ To evaluate counting performance, we
adopt standard counting metrics (MAE,
RMSE).
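
For reference, the metrics named above can be computed as in the sketch below. The function names and the toy numbers are illustrative assumptions, not values from the dataset; the choice of "sitting" as the positive class is also an assumption.

```python
import math


def mae(true_counts, pred_counts):
    # Mean absolute error over per-image counts.
    errors = [abs(t - p) for t, p in zip(true_counts, pred_counts)]
    return sum(errors) / len(errors)


def rmse(true_counts, pred_counts):
    # Root mean square error; penalizes large per-image errors more than MAE.
    squared = [(t - p) ** 2 for t, p in zip(true_counts, pred_counts)]
    return math.sqrt(sum(squared) / len(squared))


def precision_recall_accuracy(true_labels, pred_labels, positive="sitting"):
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, correct / len(pairs)


# Toy per-image sitting counts (hypothetical numbers).
true_counts = [3, 5, 10]
pred_counts = [4, 5, 8]
count_mae = mae(true_counts, pred_counts)    # (1 + 0 + 2) / 3 = 1.0
count_rmse = rmse(true_counts, pred_counts)  # sqrt((1 + 0 + 4) / 3)

# Toy per-person labels (hypothetical).
true_labels = ["sitting", "sitting", "standing", "standing"]
pred_labels = ["sitting", "standing", "standing", "sitting"]
precision, recall, accuracy = precision_recall_accuracy(true_labels, pred_labels)
```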

Mean Absolute Error (MAE), before and after post processing (PP)

Density   Before PP   After PP
Low       0.67        0.62
Medium    1.79        1.74
High      4.05        3.63



Example output: Sit Count: 17, Stand Count: 6


Motivation 


□ Categorization based on body posture,
especially the counts of sitting and standing
people, can add a new dimension to
services built on crowd analysis.

□ Existing approaches lose local visual 
information, making it impossible to 
categorize the people present in the 
image. 
Stage 3: Post Processing Using Linear Regression

□ A weighted linear regressor is fitted to
the detected head locations and baseline
labels.

□ The label of each person is re-estimated
using this revised decision boundary.
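
One way to read the post-processing step is sketched below. The poster does not specify the features, the target encoding, or the source of the weights, so these are all assumptions: each head is described by its image coordinates (x, y), baseline labels are encoded as +1 (sitting) and -1 (standing), and the weights stand in for a hypothetical per-person confidence. A weighted least-squares fit of the label on (1, x, y) gives a linear decision boundary, and each person is relabeled by the sign of the fitted value.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_weighted_boundary(heads, labels, weights):
    # Weighted normal equations: (A^T W A) beta = A^T W t, with rows (1, x, y).
    rows = [(1.0, x, y) for x, y in heads]
    ata = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
            for j in range(3)] for i in range(3)]
    atb = [sum(w * r[i] * t for r, t, w in zip(rows, labels, weights))
           for i in range(3)]
    return solve_linear_system(ata, atb)


def relabel(heads, coeffs):
    a, b, c = coeffs
    return [1 if a + b * x + c * y >= 0 else -1 for x, y in heads]


# Hypothetical heads: standing people near y = 100, sitting people near y = 200.
heads = [(0, 100), (100, 100), (0, 200), (100, 200), (50, 100)]
labels = [-1, -1, 1, 1, 1]             # last baseline label is likely wrong
weights = [1.0, 1.0, 1.0, 1.0, 0.2]    # assumed low confidence for the last one

coeffs = fit_weighted_boundary(heads, labels, weights)
revised = relabel(heads, coeffs)
```

In this toy case the low-weight outlier at (50, 100) is pulled back to the standing side of the boundary, because the confident neighbors at the same height dominate the fit.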


Root Mean Square Error (RMSE), before and after post processing (PP)

Density   Before PP   After PP
Low       1.17        1.15
Medium    3.08        2.65
High      5.55        5.48


Conclusion


□ Our model works particularly well in
sparse crowds, which are more prevalent in
the environments where such a system is
most pertinent.

□ Traditional crowd counting methods
struggle in low-density environments
because they tend to overestimate.

□ To make better use of spatial information,
an LSTM-based RNN in the classification
stage is an interesting direction to explore.

□ Further types of categorization in crowd 
counting could extract more useful crowd 
characteristics. 